A New MPI Implementation for Cray SHMEM
Abstract
Previous implementations of MPICH using the Cray SHMEM interface existed for the Cray T3 series of machines, but these implementations were abandoned after the T3 series was discontinued. However, support for the Cray SHMEM programming interface has continued on other platforms, including commodity clusters built using the Quadrics QsNet network. In this paper, we describe a design for MPI that overcomes some of the limitations of the previous implementations. We compare the performance of the SHMEM MPI implementation with the native implementation for Quadrics QsNet. Results show that our implementation is faster for certain message sizes for some micro-benchmarks.
Similar resources
The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600
This paper compares the performance and scalability of SHMEM and MPI-2 one-sided routines on different communication patterns for a SGI Origin 2000 and a Cray T3E-600. The communication tests were chosen to represent commonly used communication patterns with low contention (accessing distant messages, a circular right shift, a binary tree broadcast) to communication patterns with high contentio...
Optimizing Cray MPI and SHMEM Software Stacks for Cray-XC Supercomputers based on Intel KNL Processors
HPC applications commonly use Message Passing Interface (MPI) and SHMEM programming models to achieve high performance in a portable manner. With the advent of the Intel MIC processor technology, hybrid programming models that involve the use of MPI/SHMEM along with threading models (such as OpenMP) are gaining traction. However, most current generation MPI implementations are not poised to off...
Current State of the Cray MPT Software Stacks on the Cray XC Series Supercomputers
HPC applications heavily rely on Message Passing Interface (MPI) and SHMEM programming models to develop distributed memory parallel applications. This paper describes a set of new features and optimizations that have been introduced in Cray MPT software libraries to optimize the performance of scientific parallel applications on modern Cray XC series supercomputers. For Cray XC systems based o...
Performance analysis of asynchronous Jacobi's method implemented in MPI, SHMEM and OpenMP
Ever-increasing core counts create the need to develop parallel algorithms that avoid closely coupled execution across all cores. In this paper we present performance analysis of several parallel asynchronous implementations of Jacobi’s method for solving systems of linear equations, using MPI, SHMEM and OpenMP. In particular we have solved systems of over 4 billion unknowns using up to 32,768 p...
Parallel Priority Queues on Cray - T
We examine the design, implementation, and experimental analysis of parallel priority queues for network simulation. We consider: a) distributed splay trees using MPI, b) concurrent heaps using shared memory atomic locks, and c) a new, more general concurrent data structure based on distributed sorted lists, which is designed to provide dynamically balanced work allocation (with automatic or ma...